Reinforcement Learning for Average Reward Zero-Sum Games
نویسنده
چکیده
We consider Reinforcement Learning for average reward zerosum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the second on Q-learning for stochastic shortest path games. Convergence is proved using the ODE (Ordinary Differential Equation) method. We further discuss the case where not all the actions are played by the opponent with comparable frequencies and present an algorithm that converges to the optimal Q-function, given the observed play of the opponent.
منابع مشابه
Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations
This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve fo...
متن کاملR-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible...
متن کاملLearning in Markov Games with Incomplete Information
The Markov game (also called stochastic game (Filar & Vrieze 1997)) has been adopted as a theoretical framework for multiagent reinforcement learning (Littman 1994). In a Markov game, there are n agents, each facing a Markov decision process (MDP). All agents’ MDPs are correlated through their reward functions and the state transition function. As Markov decision process provides a theoretical ...
متن کاملA Nested Family of $k$-total Effective Rewards for Positional Games
We consider Gillette’s two-person zero-sum stochastic games with perfect information. For each k ∈ Z+ we introduce an effective reward function, called k-total. For k = 0 and 1 this function is known as mean payoff and total reward, respectively. We restrict our attention to the deterministic case. For all k, we prove the existence of a saddle point which can be realized by uniformly optimal pu...
متن کاملLearning in Average Reward Stochastic Games A Reinforcement Learning (Nash-R) Algorithm for Average Reward Irreducible Stochastic Games
A large class of sequential decision making problems under uncertainty with multiple competing decision makers can be modeled as stochastic games. It can be considered that the stochastic games are multiplayer extensions of Markov decision processes (MDPs). In this paper, we develop a reinforcement learning algorithm to obtain average reward equilibrium for irreducible stochastic games. In our ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004